I/O Characterization of Big Data Workloads in Data Centers

Authors

  • Fengfeng Pan
  • Yinliang Yue
  • Jin Xiong
  • Daxiang Hao
Abstract

As data volumes grow rapidly, more and more organizations rely on data centers to make effective decisions and gain a competitive edge. Big data applications have gradually come to dominate data center workloads, so it has become increasingly important to understand their behaviour in order to further improve data center performance. Because the performance gap between I/O devices and CPUs keeps widening, I/O performance dominates overall system performance, which makes characterizing the I/O behaviour of big data workloads imperative. In this paper, we select four typical big data workloads from a broad range of application areas in BigDataBench, a big data benchmark suite derived from internet services: Aggregation, TeraSort, Kmeans and PageRank. We conduct a detailed analysis of their I/O characteristics, including disk read/write bandwidth, I/O device utilization, average waiting time of I/O requests, and average size of I/O requests, which can serve as a guide for designing high-performance, low-power and cost-aware big data storage systems.
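
The abstract does not say how the four metrics were collected. As a minimal sketch only, and assuming a Linux host, the following Python shows how these quantities are commonly derived from two snapshots of the cumulative counters in `/proc/diskstats` (the same approach tools such as `iostat` take). The names `DiskSample`, `parse_diskstats_line` and `io_metrics` are hypothetical helpers introduced for illustration, not anything taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


@dataclass(frozen=True)
class DiskSample:
    """Cumulative per-device counters, as exposed by /proc/diskstats."""
    reads: int            # read requests completed
    sectors_read: int
    read_ms: int          # milliseconds spent on reads
    writes: int           # write requests completed
    sectors_written: int
    write_ms: int         # milliseconds spent on writes
    io_ms: int            # milliseconds the device had I/O in flight


def parse_diskstats_line(line: str) -> tuple[str, DiskSample]:
    """Parse one /proc/diskstats line into (device name, counters)."""
    f = line.split()
    # f[0], f[1] are major/minor numbers; f[2] is the device name.
    return f[2], DiskSample(
        reads=int(f[3]), sectors_read=int(f[5]), read_ms=int(f[6]),
        writes=int(f[7]), sectors_written=int(f[9]), write_ms=int(f[10]),
        io_ms=int(f[12]),
    )


def io_metrics(before: DiskSample, after: DiskSample, interval_s: float) -> dict[str, float]:
    """Derive bandwidth, utilization, wait time and request size from two samples."""
    ios = (after.reads - before.reads) + (after.writes - before.writes)
    sectors_r = after.sectors_read - before.sectors_read
    sectors_w = after.sectors_written - before.sectors_written
    wait_ms = (after.read_ms - before.read_ms) + (after.write_ms - before.write_ms)
    return {
        "read_MBps": sectors_r * SECTOR_BYTES / interval_s / 1e6,
        "write_MBps": sectors_w * SECTOR_BYTES / interval_s / 1e6,
        # Share of the interval during which the device was busy.
        "util_pct": 100.0 * (after.io_ms - before.io_ms) / (interval_s * 1000.0),
        # Average time per request, queueing plus service.
        "await_ms": wait_ms / ios if ios else 0.0,
        # Average request size, in 512-byte sectors.
        "avg_request_sectors": (sectors_r + sectors_w) / ios if ios else 0.0,
    }
```

In practice one would read `/proc/diskstats` twice, a fixed interval apart, parse the line for the device of interest each time, and pass both samples to `io_metrics`.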

Related resources

A Performance Study of Big Data on Small Nodes

The continuous increase in volume, variety and velocity of Big Data exposes datacenter resource scaling to an energy utilization problem. Traditionally, datacenters employ x86-64 (big) server nodes with power usage of tens to hundreds of Watts. But lately, low-power (small) systems originally developed for mobile devices have seen significant improvements in performance. These improvements could...

Understanding Big Data Analytic Workloads on Modern Processors

Big data analytics applications play a significant role in data centers, and hence it has become increasingly important to understand their behaviors in order to further improve the performance of data center computer systems, in which characterizing representative workloads is a key practical problem. In this paper, after investigating the three most important application domains in terms of page ...

Understanding Vertical Scalability of I/O Virtualization for MapReduce Workloads: Challenges and Opportunities

As the explosion of data sizes continues to push the limits of our abilities to efficiently store and process big data, next generation big data systems face multiple challenges. One such important challenge relates to the limited scalability of I/O, a determining factor in the overall performance of big data applications. Although paradigms like MapReduce have long been used to take advantage ...

Zone-based data striping for cloud storage

Data centers in the "cloud" will need to support a wide range of applications, each with their own input/output (I/O) requirements. Some applications perform small and random I/O operations, whereas others demand high streaming bandwidth. In addition, in order to reduce costs, cloud data centers will contain thousands of commodity servers and network switches. Delivering high performance with u...

Architectural Impact on Performance of In-memory Data Analytics: Apache Spark Case Study

While cluster computing frameworks are continuously evolving to provide real-time data analysis capabilities, Apache Spark has managed to stay at the forefront of big data analytics by being a unified framework for both batch and stream data processing. However, recent studies on micro-architectural characterization of in-memory data analytics are limited to only batch processing workloads. We ...

Publication date: 2014